Algorithmic warm starts for Hamiltonian Monte Carlo

Zhang, Matthew S., Altschuler, Jason M., Chewi, Sinho

arXiv.org Machine Learning

Generating samples from a continuous probability density is a central algorithmic problem across statistics, engineering, and the sciences. For high-dimensional settings, Hamiltonian Monte Carlo (HMC) is the default algorithm across mainstream software packages. However, despite the extensive line of work on HMC and its widespread empirical success, it remains unclear how many iterations of HMC are required as a function of the dimension $d$. On one hand, a variety of results show that Metropolized HMC converges in $O(d^{1/4})$ iterations from a warm start close to stationarity. On the other hand, Metropolized HMC is significantly slower without a warm start, e.g., requiring $Ω(d^{1/2})$ iterations even for simple target distributions such as isotropic Gaussians. Finding a warm start is therefore the computational bottleneck for HMC. We resolve this issue for the well-studied setting of sampling from a probability distribution satisfying strong log-concavity (or isoperimetry) and third-order derivative bounds. We prove that \emph{non-Metropolized} HMC generates a warm start in $\tilde{O}(d^{1/4})$ iterations, after which we can exploit the warm start using Metropolized HMC. The resulting algorithm, with total complexity $\tilde{O}(d^{1/4})$, is the fastest known for high-accuracy sampling under these assumptions, improving over the prior best of $\tilde{O}(d^{1/2})$. This closes the long line of work on the dimensional complexity of Metropolized HMC in such settings, and also provides a simple warm-start prescription for practical implementations.
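To make the algorithm under discussion concrete, here is a minimal sketch of Metropolized HMC — leapfrog integration of Hamiltonian dynamics followed by a Metropolis accept/reject step — run on an isotropic Gaussian target, the same simple example the abstract mentions. All names, step sizes, and iteration counts are illustrative choices, not the paper's tuned parameters.

```python
import numpy as np

def metropolized_hmc(logp, grad_logp, x0, step, n_leap, n_iter, rng):
    """Single-chain Metropolized HMC with a leapfrog integrator."""
    x = np.array(x0, dtype=float)
    out = []
    for _ in range(n_iter):
        p0 = rng.standard_normal(x.shape)      # resample momentum
        xn, pn = x.copy(), p0.copy()
        pn += 0.5 * step * grad_logp(xn)       # half step for momentum
        for i in range(n_leap):
            xn += step * pn                    # full step for position
            if i != n_leap - 1:
                pn += step * grad_logp(xn)     # full step for momentum
        pn += 0.5 * step * grad_logp(xn)       # final half step
        # Metropolis correction via the Hamiltonian: -log density + kinetic energy
        h0 = -logp(x) + 0.5 * p0 @ p0
        h1 = -logp(xn) + 0.5 * pn @ pn
        if rng.random() < np.exp(min(0.0, h0 - h1)):
            x = xn
        out.append(x.copy())
    return np.array(out)

# Isotropic Gaussian target in d = 10 dimensions (illustrative settings).
d = 10
rng = np.random.default_rng(0)
samples = metropolized_hmc(lambda x: -0.5 * x @ x, lambda x: -x,
                           np.zeros(d), step=0.2, n_leap=10,
                           n_iter=2000, rng=rng)
```

The accept/reject step is what makes the chain exact for the target; the non-Metropolized variant in the abstract simply omits it, trading exactness for a cheaply obtained warm start.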







Fixed-Distance Hamiltonian Monte Carlo

Neural Information Processing Systems

Markov chain Monte Carlo (MCMC) is an inference mechanism that approximates a target probability distribution by a sequence of states (a.k.a. samples).



Some aspects of robustness in modern Markov Chain Monte Carlo

Power, Sam, Vasdekis, Giorgos

arXiv.org Machine Learning

Markov Chain Monte Carlo (MCMC) is a flexible approach to approximate sampling from intractable probability distributions, with a rich theoretical foundation and a wealth of exemplar algorithms. While the qualitative correctness of MCMC algorithms is often easy to ensure, their practical efficiency is contingent on the 'target' distribution being reasonably well-behaved. In this work, we concern ourselves with the scenario in which this good behaviour is called into question, reviewing an emerging line of work on 'robust' MCMC algorithms which can perform acceptably even in the face of certain pathologies. We focus on two particular pathologies which, while simple, can already have dramatic effects on standard 'local' algorithms. The first is roughness, whereby the target distribution varies so rapidly that the numerical stability of the algorithm is tenuous. The second is flatness, whereby the landscape of the target distribution is instead so barren and uninformative that one becomes lost in uninteresting parts of the state space. In each case, we formulate the pathology in concrete terms, review a range of proposed algorithmic remedies, and outline promising directions for future research.


Polynomial Mixing Times of Simulated Tempering for Mixture Targets by Conductance Decomposition

Zhou, Quan

arXiv.org Machine Learning

We study the theoretical complexity of simulated tempering for sampling from mixtures of log-concave components differing only by location shifts. The main result establishes the first polynomial-time guarantee for simulated tempering combined with the Metropolis-adjusted Langevin algorithm (MALA) with respect to the problem dimension $d$, maximum mode displacement $D$, and logarithmic accuracy $\log ε^{-1}$. The proof builds on a general state decomposition theorem for $s$-conductance, applied to an auxiliary Markov chain constructed on an augmented space. We also obtain an improved complexity estimate for simulated tempering combined with random-walk Metropolis. Our bounds assume an inverse-temperature ladder with smallest value $β_1 = O(D^{-2})$ and spacing $β_{i+1}/β_i = 1 + O( d^{-1/2} )$, both of which are shown to be asymptotically optimal up to logarithmic factors.
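The inner sampler the abstract pairs with simulated tempering, MALA, is itself compact: a Langevin (gradient) proposal corrected by a Metropolis-Hastings step. Below is a minimal sketch of one MALA transition, demonstrated on a standard Gaussian target; the function name, step size, and chain length are illustrative, not the paper's setup.

```python
import numpy as np

def mala_step(x, logp, grad_logp, step, rng):
    """One Metropolis-adjusted Langevin step.

    Proposal: x' = x + step * grad_logp(x) + sqrt(2 * step) * noise,
    accepted with the Metropolis-Hastings correction for the
    asymmetric Langevin proposal density q(x' | x).
    """
    noise = rng.standard_normal(x.shape)
    prop = x + step * grad_logp(x) + np.sqrt(2 * step) * noise

    def log_q(dst, src):  # log density (up to a constant) of dst | src
        d = dst - src - step * grad_logp(src)
        return -(d @ d) / (4 * step)

    log_a = logp(prop) + log_q(x, prop) - logp(x) - log_q(prop, x)
    if rng.random() < np.exp(min(0.0, log_a)):
        return prop
    return x

# Run a short chain on a standard Gaussian in d = 5 (illustrative settings).
rng = np.random.default_rng(1)
x = np.zeros(5)
chain = []
for _ in range(3000):
    x = mala_step(x, lambda y: -0.5 * y @ y, lambda y: -y, 0.2, rng)
    chain.append(x)
chain = np.array(chain)
```

On a multimodal mixture target like the one studied in the abstract, such local steps get trapped in a single component; the tempering ladder with spacing $β_{i+1}/β_i = 1 + O(d^{-1/2})$ is what lets the chain traverse between modes at low inverse temperatures.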